Abstract

We  propose  an  agent-based  model  to  simulate  the  creation  of  street  gang  rivalries.  The  movement  dynamics 
of  agents  are  coupled  to  an  evolving  network  of  gang  rivalries,  which  is  determined  by  previous  interactions 
among  agents  in  the  system.  Basic  gang  data,  geographic  information,  and  behavioral  dynamics  suggested 
by  the  criminology  literature  are  integrated  into  the  model.  The  major  highways,  rivers,  and  the  locations  of 
gangs’  centers  of  activity  influence  the  agents’  motion.  We  use  a  policing  division  of  the  Los  Angeles  Police 
Department  as  a  case  study  to  test  our  model.  We  apply  common  metrics  from  graph  theory  to  analyze 
our  model,  comparing  networks  produced  by  our  simulations  and  an  instance  of  a  Geographical  Threshold 
Graph to the existing network from the criminology literature.

1.  Introduction

Street  gangs  are  a  growing  problem  around  the  world  [13,  33,  32].  In  fact,  recent  statistics  from  The 
National  Gang  Intelligence  Center  estimate  there  are  1  million  active  gang  members  in  the  United  States 
alone  [55].  Violence  is  intrinsic  to  street  gangs,  and  rival  gangs  battle  to  gain  respect  and  street  reputation 
[68,  14].  Criminal  activities  perpetrated  by  gang  members,  including  armed  robbery,  homicide,  drug  dealing, 
and  auto  theft,  drain  cities  and  governments  of  tight  resources  and  also  pose  safety  threats  to  community 
members.  Much  of  the  research  on  street  gangs  has  been  conducted  within  the  United  States,  though  there 
have  been  some  efforts  to  understand  the  phenomenon  in  Europe  and  other  parts  of  the  world  [13,  33,  32]. 

Violence  perpetrated  by  gang  members  is  frequently  against  members  of  a  different  gang.  In  areas  with 
numerous  gangs,  it  is  common  for  gangs  to  have  multiple  violent  interactions  with  many  of  the  other  gangs. 
Further, street gang members typically have locations, known as set spaces, where they spend large quantities
of  time  [69,  53].  It  is  therefore  reasonable  to  think  of  each  gang  as  a  node  embedded  in  Euclidean  space 
[56,  71].  Within  this  framework,  the  existence  of  persistent  violence  between  two  gangs  becomes  an  edge 
connecting  two  nodes.  From  this  construction,  one  can  view  a  collection  of  gangs  as  a  spatially  embedded 
network  [70].  The  Hollenbeck  policing  division  of  eastern  Los  Angeles  is  marked  by  a  particularly  high  degree 
of violent crimes involving gang members, including homicides and aggravated assaults [56, 28]. It is for this
reason and others listed in Section 1.4 that we consider Hollenbeck as a case study for our model.

Statistical approaches are often used to analyze gang activities. For example, in their recent paper, the
authors  of  [56]  used  a  network  statistical  approach  called  CONCOR  to  partition  the  region  of  Hollenbeck 
into  areas  of  similar  violence  activity  and  geographic  proximity.  From  this,  they  were  able  to  conclude 
that  a  relationship  exists  between  the  location  of  a  gang  and  its  rivals.  Other  authors  have  utilized  various 
statistical approaches to examine gang-related data [71, 69]. The methodology of determining factors that
correlate  with  gang  activity  has  been  able  to  describe  certain  features  of  the  observed  system.  This  approach 
has  made  important  contributions  in  the  field,  but  it  cannot  make  solid  causal  arguments  or  test  theories 
[37,  74].  Though  one  could  argue  that  creating  an  experiment  would  help  validate  causal  arguments,  many 
of  these  experiments  are  not  feasible  due  to  monetary  and  resource  constraints  and  could  be  unethical  and 
infringe  on  basic  human  rights.  Even  in  circumstances  where  the  resources  are  available,  interventions  may 
fail  to  be  implemented  as  planned.  This  is  evident  in  the  work  of  [71]. 

Obtaining complete and valid data sets is a common issue in the field of criminology. Data sets are often
unreliable because of inaccuracies, underreporting, and potential bias [16]. This poses a fundamental
problem for conclusions drawn from faulty data. In [25], a strong case is made to move beyond statistical
modeling  and  instead  model  social  phenomena  using  a  mathematical  approach.  We  use  such  an  approach 
with  the  aim  to  understand  the  plausible  mechanisms  for  the  formation  of  gang  rivalries  and  to  provide  social 
scientists  a  means  by  which  to  test  social  theories.  For  comparison,  we  propose  two  different  mathematical 
techniques:  a  pure  network  and  an  agent-based  approach. 

Our  goal  in  this  work  is  to  understand  how  long-term  rivalries  among  gangs  develop  by  using  a  model 
including  geography,  social  dynamics,  and  human  mobility  patterns.  Because  we  have  only  one  observed 
rivalry  network,  this  is  a  very  difficult  inverse  problem.  Furthermore,  statistical  approaches  are  not  able  to 
determine  causal  effects.  As  a  solution,  we  propose  an  agent-based  model  that  is  coupled  to  a  dynamically 
evolving network. This bottom-up approach simulates the mobility of gang members by using the conclusions
from current literature in human mobility patterns; see Section 2.1 for more details. These agents interact to
form  a  rivalry  network.  We  compare  the  resulting  simulated  network  to  the  gang  rivalry  network  observed 
in  the  eastern  Los  Angeles  division  of  Hollenbeck  [71,  56]. 

The  outline  of  our  paper  is  as  follows:  network  and  agent-based  modeling  approaches  are  described  in 
Sections 1.1 and 1.2, and previous work on crime models is discussed in Section 1.3. Detailed information
concerning the particulars of Hollenbeck is found in Section 1.4. In Section 2, we outline the proposed
model.  In  Section  3,  we  describe  two  baseline  models,  one  instance  of  a  Geographical  Threshold  Graph  and 
a  network  derived  from  Brownian  Motion.  We  contrast  these  simpler  models  with  our  model  to  demonstrate 
the  need  for  a  more  complex  approach.  In  Section  4,  we  describe  a  series  of  metrics  from  network  theory, 
examine long-term behavior of the model, and compare the networks against the metrics. Section 5 provides
a  sensitivity  analysis  of  our  model.  We  conclude  and  give  future  directions  in  Section  6. 

1.1.  Network  Models 

General  network  models  and  the  corresponding  analysis  are  useful  for  describing  the  behavior  of  complex 
systems  and  have  played  an  increasingly  active  role  [46,  47,  49].  One  way  networks  are  treated  in  the 
literature  is  by  analyzing  the  statistical  properties  of  a  given  network.  Another  approach  is  to  consider 
the  construction  of  a  network.  There  are  many  instances  where  the  network  of  interest  is  not  known,  but 
there  is  some  knowledge  of  the  processes  by  which  the  network  is  formed.  One  popular  method  to  construct 
a  network  is  to  view  it  as  a  random  graph.  Each  edge  is  added  with  a  predetermined  probability,  often 
dependent  on  the  weight  of  the  nodes  [1,  50,  51]. 

In  some  applications,  including  gang  rivalry  networks,  the  geographic  location  of  the  nodes  influences 
the  structure  of  the  network.  In  such  cases,  geographic  features  should  be  considered  as  part  of  the  random 
network  model.  For  example,  interstate  highways  have  been  shown  to  be  structurally  different  from  scale- 
free  networks  such  as  Internet  and  airline  flight  networks  [22].  The  importance  of  geography  is  also  seen 
in  friendship  networks  in  [77].  This  work  was  descriptive  by  nature,  and  was  therefore  limited  in  the 
conclusions  that  could  be  drawn.  In  their  paper,  Liben-Nowell  et  al.  use  the  publicly  accessible  location  of 
495,836 bloggers in the LiveJournal community to investigate effects of spatial proximity on friendships [35]. The study
found  that  an  estimated  69%  of  a  person’s  friends  can  be  described  by  geography.  This  paper  constructed  a 
simulation  and  was  able  to  create  plausible  scenarios  and  test  theories  of  information  spreading  on  a  social 
network, again highlighting a strength of a mathematical modeling approach.

In the scenario of gang rivalries, the geographic location of gangs plays a role in the observed gang rivalry
structure [71]. Because of this, one of our approaches is to construct a simple network model that incorporates
the  proximities  between  gang  set  spaces.  One  method  for  incorporating  geographical  information  into  the 
random  graph  construction  is  by  using  a  Geographical  Threshold  Graph  [39,  8,  9].  This  is  a  random  graph 
on  a  set  of  randomly  weighted  nodes,  where  the  nodes  are  located  in  a  metric  space  and  the  connections 
are  determined  by  thresholding  a  function  of  the  distance  and  the  weights.  This  provides  an  efficient  way 
to  construct  a  rivalry  network  while  incorporating  some  geographic  information.  We  use  an  instance  of  a 
Geographical  Threshold  Graph  as  a  baseline  against  which  we  compare  our  model. 
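
As a rough illustration of the construction described above, the following Python sketch builds one such graph. The connection rule (w_i + w_j) / d_ij^alpha >= theta, the function name, and all parameter values are assumptions made for illustration only; the text above does not specify which function of distance and weights is thresholded in the baseline.

```python
import numpy as np

def geographical_threshold_graph(positions, weights, theta, alpha=2.0):
    """Return a symmetric boolean adjacency matrix.

    Assumed (hypothetical) rule: nodes i and j are connected when
    (w_i + w_j) / d_ij**alpha >= theta, with d_ij the Euclidean distance.
    """
    positions = np.asarray(positions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # avoid dividing by zero on the diagonal
    score = (weights[:, None] + weights[None, :]) / dist ** alpha
    adjacency = score >= theta
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency

# Toy data: five nodes with random locations and random weights.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(5, 2))
weights = rng.exponential(1.0, size=5)
adjacency = geographical_threshold_graph(positions, weights, theta=10.0)
```

Because the rule depends only on node weights and pairwise distances, the graph is produced in a single pass, which is what makes this kind of baseline cheap compared with a simulation.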

1.2.  Agent-Based  Models

Network  models  can  provide  a  computationally  inexpensive  means  to  reproduce  an  observed  network, 
but  these  models  do  not  lend  themselves  to  describing  phenomena  of  interest  beyond  the  structure  of  the 
network.  An  alternative  modeling  approach  is  to  use  agent-based  models.  These  models  are  generally  used 
for  complex  systems,  since  they  are  able  to  capture  details  at  the  level  of  an  individual,  or  agent.  In  this 
class  of  models,  agents  often  move  through  phase  space  and  interact  amongst  themselves,  producing  complex 
dynamics  and  patterns  from  simple  behavioral  rules.  Agent-based  models  can  answer  different  questions  from 
a network approach since they focus on the way individuals’ actions can determine the behavior at the system
level. 

This  type  of  modeling  approach  is  widely  used  in  economics  [75,  67,  78],  epidemiology  [19],  sociology  [65], 
biology  [27],  and  other  situations  in  which  complex  systems  are  encountered.  The  strength  of  agent-based 
modeling  is  that  it  allows  for  responses  at  the  individual  level  to  be  directly  incorporated  into  the  model. 
For example, agent-based models have been proposed in the context of searching and swarming [36], and
much  interesting  mathematical  analysis  has  been  done  on  the  behavior  of  these  systems  [15,  12].  In  this  way, 
modelers  can  include  information  and  behavioral  dynamics  from  scientists  who  study  the  complex  system  of 
interest.  Agent-based  methods  have  been  proposed  to  describe  social  systems  and  economics  since  they  can 
provide  a  means  by  which  to  test  theories  about  individual  dynamics  in  cases  where  the  dynamics  are  not 
precisely  known  [20,  37]. 

In  our  model,  we  are  interested  in  the  coupling  between  the  network  and  the  underlying  system.  There 
has been some exploration of this in the literature. For example, Schweitzer and Tilch present
a  model  that  uses  an  agent-based  approach  to  form  an  emerging  network  [58,  59].  They  model  the  chemical 
trail  formed  by  ants  searching  for  food  at  an  unknown  location.  As  the  ants  search  their  environment, 
networks  of  chemical  trails  form  with  which  the  ants  interact.  Another  example  is  that  of  the  EpiSims  model 
[73,  41].  Here,  the  contact  networks  of  the  populations  are  evolving  over  time  and  depend  on  the  internal 
attributes  of  the  people  in  the  population.  In  turn,  as  a  disease  is  spread  through  the  contact  network,  the 
movements  of  the  people  change  in  response  to  the  disease,  producing  a  non-trivial  interaction  between  the 
system  and  the  network. 

One  of  the  major  strengths  of  this  approach  is  the  flexible  framework  available  for  these  models.  For 
example,  this  method  can  easily  incorporate  environmental  and  spatial  information  inherent  to  the  system. 
One  example  of  this  is  shown  in  [4],  where  the  authors  were  able  to  incorporate  environmental  information 
in  the  form  of  temperature  and  current,  as  well  as  geography  of  landmasses,  to  accurately  model  and  predict 
the  migration  patterns  of  a  species  of  pelagic  fish.  In  our  agent-based  model,  we  use  information  about  the 
environment  in  the  form  of  freeways,  rivers,  and  road  density.  In  a  network  context,  coupling  an  agent-based 
model  to  a  network  is  a  novel  way  to  explore  how  changing  dynamics  of  individual  agents  can  affect  the 
evolution  of  the  network,  providing  control  parameters  which  would  be  inaccessible  in  a  graph-based  model. 

1.3.  Previous  Work  on  Crime  Modeling 

Many mathematical models have been created to study the patterns, mechanisms, and potential interventions
associated with crime. Research has been conducted to address various aspects of burglaries such
as  hot  spot  formation  [61,  63,  10],  policing  strategies  [31,  54],  criminal  cooperation  [62],  and  geographical 
profiling  [52].  Some  attempts  have  also  been  made  to  model  gang  behaviors.  The  study  in  [42]  examines 
the short-term retaliatory behavior of the rivalries based on between-gang violence data from the LAPD. In
this work, each violent event between two gangs is considered an instance of a point process associated with
that pair of gangs. The intensity of the rivalry depends directly on the network of unidirectional violent
interactions. This provides a top-down approach to understanding immediate consequences of violence among
gangs  within  a  system. 

In the model proposed by [17], an agent-based approach was used to simulate the location of violent
interactions and gang retaliations in Hollenbeck. Embedded in the model was a rivalry network. Though
the model recreated features similar to the violence data, it made unrealistic assumptions about the
mobility of the agents. For instance, the agents’ movements were influenced by the location of every rival
gang member, an implausible assumption. Further, agents only moved towards rival gang members, ignoring
their own set space, or center of activity. However, according to the criminology literature, this retaliatory
behavior is only seen on short time scales [42]. Furthermore, gang members tend to avoid the territory of
rival gangs and spend large quantities of time at their set spaces [2, 34]. Another concern is that the
agents ignored geographic features, such as highways and rivers, known to correlate with the rivalry structure
[56]. It has been shown in other models simulating human movement that highways restrict movements [26].

We  propose  a  bottom-up  approach  with  an  agent-based  model  that  incorporates  movement  rules  from 
literature  on  human  mobility  in  order  to  capture  the  long-term  gang  rivalry  structure.  These  rules  consider 
geographical  features  known  to  relate  to  movement  dynamics.  We  also  consider  current  literature  on  known 
gang behavior as a basis for directional decisions. More details are given in Section 2.1. We use Hollenbeck
as  a  case  study  for  our  model. 

1.4.  Hollenbeck

Hollenbeck  is  a  policing  division  located  in  eastern  Los  Angeles,  surrounded  by  downtown  Los  Angeles  to 
the west, Pasadena to the northeast, Vernon to the south, and the unincorporated area of East
Los Angeles to the east (see Figure 1). Hollenbeck has a diverse geography, with many highways cutting through the
region, and is bounded by the Los Angeles River. It encompasses an area of roughly 39.4 km². Hollenbeck
is home to approximately twenty-nine active gangs with sixty-nine rivalries among them [56, 71]. The set
spaces  for  the  gangs  and  the  corresponding  observed  rivalry  network  are  displayed  in  Figure  1,  as  given  in 
[56]. 

Certain properties of Hollenbeck make it well suited to modeling the gang rivalry networks outlined in
[71, 56]. First, it is a closed system in that the gang activity within Hollenbeck is generally isolated from gang
activity outside of Hollenbeck. Further, the motivation for violence between gangs is largely characterized
by disputes over geographical gang territories, as opposed to drug-related and racially motivated violence. Data on
the  geography  of  Hollenbeck  is  easily  accessible,  and  there  has  been  explicit  documentation  of  the  observed 
rivalry  network. 

2.  Our  Model 

Our  objective  is  to  model  the  long-term  gang  rivalry  structure  and  gang  member  mobility  by  incorporating 
simple  behavioral  rules  and  geographical  factors,  such  as  road  density,  highways,  and  locations  of  gangs’  set 
spaces  (centers  of  activity).  The  movement  of  each  agent  is  not  intended  to  model  each  detail  involved  in  an 
individual’s  mobility,  but  rather  capture  the  statistical  behavior  of  people’s  movements  observed  from  the 
literature. Agents in the model move based on their location with respect to their own and other gangs’ set spaces
and  interact  with  agents  of  different  gang  affiliations.  We  count  the  number  of  interactions  between  gangs, 
and  when  agents  of  different  gangs  move  within  a  certain  distance  of  each  other,  the  number  of  interactions 
between  those  gangs  increases  by  one.  As  the  simulation  progresses,  a  network  structure  emerges.  The 
weighted  network  of  interactions  in  turn  influences  the  directional  decisions  of  the  agents. 

2.1.  Motivation  for  model  construction 

The  intent  of  this  model  is  to  capture  the  broad  statistical  features  of  human  mobility  with  an  emphasis 
on  gang  members’  movements.  Empirical  data  on  the  location  and  individual  movements  of  each  gang 
member  is  inaccessible,  so  we  characterize  the  movements  of  the  individual  gang  members  in  a  statistical 
sense based on the literature on human mobility. Several studies give compelling evidence that when people
move in an unconstrained environment, the jump lengths between movements are distributed according to a power law
[11, 57]. Further, in the presence of obstacles such as roads and buildings, the jump lengths more accurately
follow a bounded power law distribution [23].

Figure 1: Google Earth™ Image of the Hollenbeck area (left). Map of the Hollenbeck area with the location of the gang set
spaces, or centers of a gang’s activity, and the corresponding rivalry network approximated by [56], where a node of the network
represents a set space, and an edge represents a rivalry between two gangs (right). Major roads, highways, the Los Angeles
River, and division lines are also seen in both images.


Determining  the  statistical  properties  of  the  jump  length  is  only  one  aspect  of  movement  dynamics.  In 
their  paper,  Rhee  et  al.  discuss  the  need  to  incorporate  geographical  features  and  the  tendency  for  people 
to  go  home  [57].  Gonzalez  et  al.  confirmed  in  their  data  that  humans  do  tend  to  frequent  a  small  number 
of  locations  often  [23].  For  these  reasons,  the  agents  in  our  model  pick  their  jump  length  from  a  Bounded 
Pareto  distribution  and  have  a  directional  choice  in  movement. 

In  the  case  of  gangs  in  Hollenbeck,  it  is  reasonable  to  assume  that  the  gang  members  have  a  clear  sense  of 
the  location  of  their  home  territory,  or  set  space,  as  well  as  the  location  of  their  rival  gangs’  set  spaces  [69]. 
Literature  on  gang  activity  suggests  that,  in  general,  gang  members  tend  to  stay  away  from  their  rival  gangs’ 
set spaces [34]. Unlike other criminal groups, such as organized crime syndicates and insurgency groups that
strive for secrecy, street gangs are social organizations that proudly demarcate their territory and announce
their enemies through the use of graffiti. Gangs create social boundaries and therefore areas of avoidance
[2]. Our model incorporates this social geography into agents’ movement dynamics.

One  aspect  of  modeling  human  mobility  that  was  touched  on,  but  not  fully  explored,  by  the  previous 
literature  is  the  role  of  physical  features  specific  to  urban  areas  that  may  constrain  agents’  movement.  The 
first  consideration  is  the  ease  with  which  an  agent  can  move  through  a  city.  We  posit  that  in  areas  where 
there is a dense street network, the likelihood that an agent moves long distances is small due to such obstacles
as the high density of people and cars, as well as traffic lights. On the other hand, in areas where the road
density is lower, agents should be able to move longer distances. A second physical consideration that affects
human mobility in a city is the presence of highway systems and rivers that cut across the region. These features
are not impassable, in that there are underpasses and bridges. However, simulations have shown that they present an
obstacle to mobility [26]. It has been posited that these features play a role in gang rivalry networks [56].
Therefore, in our model, we view major roads, highways, and rivers as semi-permeable boundaries that affect
the  agents’  movements. 

2.2.  Model  Details 
2.2.1.  Agents 

The  agents  of  this  model  are  gang  members  in  a  city.  Each  agent  is  associated  with  exactly  one  gang. 
For  simplicity  we  assume  agents’  directional  choice  is  dictated  only  by  the  location  of  the  gang  set  spaces, 
or  centers  of  activity.  All  agents  know  the  location  of  their  home  and  rivals’  set  spaces.  We  divide  the  city 
into regions based on geographical boundaries, such as rivers and highways. An agent knows which region
it is currently in as well as the region of any prospective new locations. Embedded in the city is a rivalry
network  among  the  gangs  in  the  city.  This  network  is  encoded  in  the  rivalry  matrix  R.  Each  element  (i,  j) 
of  R  corresponds  to  the  number  of  interactions  between  gang  i  and  gang  j.  When  two  agents  are  within 
interaction  range,  we  consider  them  to  have  interacted  and  the  corresponding  element  of  the  rivalry  matrix, 
R,  is  updated.  Refer  to  Section  2.2.3  for  details  on  R. 


2.2.2.  Environment 

The  environment  of  interest  is  on  the  scale  of  a  small  city.  Agents  and  gang  set  spaces,  or  centers  of 
gang  activity,  have  a  coordinate  location  in  free  space.  The  rest  of  the  physical  features  of  the  city  are 
approximated by an N × M grid. Each point in free space is identified with the nearest grid element. The size
and  number  of  grid  elements  are  constant  throughout  the  simulation  and  are  limited  by  the  available  data 
and  the  memory  of  the  computer. 

Two  specific  features  encoded  by  this  grid  are  the  road  density  and  semi-permeable  boundaries  represented 
by  a  region  map.  Each  grid  element  of  the  road  density  contains  a  number  between  0  and  1.  A  value  of  0 
implies  a  low  road  density  whereas  a  value  of  1  implies  high  road  density.  The  semi-permeable  boundaries, 
corresponding  to  such  objects  as  highways  and  rivers,  are  assumed  to  split  the  environment  into  distinct 
regions.  We  pair  this  region  grid  with  a  transition  matrix  that  stores  the  associated  probability  of  an  agent 
to  cross  from  one  region  to  another.  These  probabilities  are  determined  at  the  start  and  remain  constant 
throughout  the  simulation.  This  is  implemented  to  discourage  agents  from  crossing  freeway  boundaries. 
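
The environment lookup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the square cells, the helper names `grid_index` and `crossing_probability`, and the toy region map, road densities, and transition probabilities are all assumptions introduced here.

```python
import numpy as np

def grid_index(point, origin, cell_size, shape):
    """Map a free-space point (x, y) to the (row, col) of its grid element."""
    col = int((point[0] - origin[0]) // cell_size)
    row = int((point[1] - origin[1]) // cell_size)
    # Clamp so that points on the outer edge still map to a valid element.
    row = min(max(row, 0), shape[0] - 1)
    col = min(max(col, 0), shape[1] - 1)
    return row, col

def crossing_probability(old_point, new_point, region_map, transition,
                         origin=(0.0, 0.0), cell_size=1.0):
    """Probability that a move is allowed, given the regions it connects."""
    r_old = region_map[grid_index(old_point, origin, cell_size, region_map.shape)]
    r_new = region_map[grid_index(new_point, origin, cell_size, region_map.shape)]
    if r_old == r_new:
        return 1.0  # movement within a region is unrestricted
    return float(transition[r_old, r_new])

# Toy 2 x 4 environment: region 0 on the left, region 1 on the right,
# separated by a hypothetical freeway that is crossed with probability 0.2.
region_map = np.array([[0, 0, 1, 1],
                       [0, 0, 1, 1]])
road_density = np.array([[0.9, 0.5, 0.2, 0.1],
                         [0.8, 0.4, 0.3, 0.0]])
transition = np.array([[1.0, 0.2],
                       [0.2, 1.0]])

delta = road_density[grid_index((1.5, 0.5), (0.0, 0.0), 1.0, road_density.shape)]
```

Storing the crossing probabilities per pair of regions, rather than per boundary segment, keeps the check for a prospective move down to two grid lookups and one matrix lookup.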


2.2.3.  Rivalries 

The network structure of the rivalries is encoded in a weighted adjacency matrix, R. Each element $R_{ij}$
contains the current history of interactions between gang i and gang j. At the end of a simulation, we
construct a thresholded rivalry graph where an edge between gang i and j exists if either $\rho_i(j)$ or $\rho_j(i)$ is
larger than a given threshold T, where

\[
\rho_i(j) = \frac{R_{ij}}{\sum_{k} R_{ik}}
\qquad \text{and} \qquad
\rho_j(i) = \frac{R_{ji}}{\sum_{k} R_{jk}}. \tag{1}
\]


The quantity $\rho_i(j)$ represents the proportion of gang i’s interactions which have occurred with gang j. Note
that $\rho_i(j)$ is not necessarily equal to $\rho_j(i)$; however, this thresholding yields a bidirectional network or,
equivalently, a symmetric adjacency matrix. Although the final rivalry network is symmetric, the influence
of the rivalry matrix $R_{ij}$ on the agents’ movements during the course of the simulations is not symmetric.
The data available to us was in the form of a symmetric network, and so we chose the thresholding rule
to be comparable with the data. Other thresholding rules that result in asymmetric networks could be
implemented. For example, if the thresholding rule for gang i solely depended on the quantity $\rho_i(j) > T$,
then the resulting network could be asymmetric.
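
The symmetric thresholding rule can be sketched in a few lines of Python. The function name and the toy interaction counts are hypothetical, and rows with no recorded interactions are simply given proportions of zero, a choice the text does not address.

```python
import numpy as np

def thresholded_rivalry_graph(R, T):
    """Edge between i and j if rho_i(j) > T or rho_j(i) > T."""
    R = np.asarray(R, dtype=float)
    row_sums = R.sum(axis=1, keepdims=True)
    # rho[i, j] = R_ij / sum_k R_ik; rows without interactions stay at zero.
    rho = np.divide(R, row_sums, out=np.zeros_like(R), where=row_sums > 0)
    adjacency = (rho > T) | (rho.T > T)
    np.fill_diagonal(adjacency, False)
    return adjacency

# Toy counts for three gangs: gang 0 interacts mostly with gang 1,
# while gang 2 has interacted only with gang 0.
R = np.array([[0, 8, 2],
              [8, 0, 0],
              [2, 0, 0]])
adjacency = thresholded_rivalry_graph(R, T=0.5)
```

In this toy case gang 2 accounts for only 20% of gang 0's interactions, yet the edge between gangs 0 and 2 is kept because gang 0 accounts for all of gang 2's interactions. This is the sense in which the proportions are asymmetric while the thresholded network is symmetric.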


2.3.  Process  overview  and  Scheduling 

At  each  iteration  an  agent  is  chosen  from  the  set  of  all  agents  with  equal  probability.  The  selected  agent 
performs  one  step  of  a  random  walk  by  choosing  a  jump  length  and  direction  from  probability  distributions 
for  its  new  prospective  location.  Depending  on  the  distributions  for  the  jump  length  and  direction,  different 
random walks will occur. The literature on human mobility suggests humans move according to a truncated
Lévy walk, motivating our model selection. Further, it is unreasonable to assume that a person’s direction
of  movement  is  solely  determined  by  set  space  locations.  Therefore,  we  use  a  statistical  distribution  to 
simulate  an  agent’s  directional  choice.  See  Section  2.1  for  more  motivational  details. 

The jump length, $x$, is chosen from the Bounded Pareto probability density function,

\[
P(x, k, x_m, x_M) = \frac{k \, x_m^{k} \, x^{-k-1}}{1 - \left( \frac{x_m}{x_M} \right)^{k}},
\qquad k > 0, \quad x_M \ge x \ge x_m > 0. \tag{2}
\]

The minimum and maximum jump lengths, $x_m$ and $x_M$ respectively, provide the bounds for the jump
length $x$. The scaling parameter, $k$, determines how quickly the probability density function decays from the
minimum jump length to the maximum jump length. For all agents the minimum jump length, $x_m$, and scale,
$k$, are fixed parameters. To determine the maximum jump length, $x_M$, the agent uses the approximated
road density of the agent’s corresponding location from the environment grid, with $x_M$ ranging from $a$ to
$A$. These values remain constant throughout the simulation. Here, $A$ is called the largest maximum jump
length and $a$ is the smallest maximum jump length. Then $x_M$ is calculated via

\[
x_M = (1 - \delta) \cdot A + a, \tag{3}
\]

where the road density at the agent’s location, $\delta$, is between 0 and 1.
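
One way to draw jump lengths from the Bounded Pareto density is inverse-transform sampling, sketched below. The sampling method, the function names, and the parameter values are assumptions made for illustration; the text does not say how the authors generate their samples. Equation 3 is used as written, so a road density of 1 gives a maximum jump length of a and a road density of 0 gives A + a.

```python
import numpy as np

def max_jump_length(delta, A, a):
    """Equation 3 as written: x_M = (1 - delta) * A + a, delta in [0, 1]."""
    return (1.0 - delta) * A + a

def sample_jump_length(rng, k, x_m, x_M, size=None):
    """Inverse-transform sample from a Bounded Pareto on [x_m, x_M].

    The CDF is F(x) = (1 - (x_m / x)**k) / (1 - (x_m / x_M)**k);
    solving F(x) = u for x gives the expression returned below.
    Requires k > 0 and x_M > x_m > 0.
    """
    u = rng.uniform(0.0, 1.0, size=size)
    ratio_k = (x_m / x_M) ** k
    return x_m * (1.0 - u * (1.0 - ratio_k)) ** (-1.0 / k)

rng = np.random.default_rng(1)
x_M = max_jump_length(delta=0.25, A=40.0, a=10.0)  # hypothetical values
jumps = sample_jump_length(rng, k=1.5, x_m=1.0, x_M=x_M, size=1000)
```

Every sample falls between the minimum and maximum jump lengths, and a higher road density shortens the upper bound, which is how dense street networks suppress long jumps in this sketch.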

The second quantity needed to move the agent to the next location is the new direction, $\theta$. This is
obtained by constructing a deterministic direction of bias. This bias incorporates the agent’s location with
respect to its home set space and the location of its rival gangs’ set spaces. Agents have a higher probability
of moving in this direction. However, with lower probability, they have the ability to move in any direction.
To account for this mobility pattern, the von Mises distribution is used to simulate the direction the agent
will move, $\theta$.

More specifically, given an agent in gang $i$, the bias direction, $\mu_i$, is defined as

\[
\mu_i = \tan^{-1}\left( \frac{y}{x} \right),
\qquad
\langle x, y \rangle = H_i(\|G_i\|_2) \, G_i + \sum_{j \neq i} D_{ij}(\|G_j\|_2) \, G_j. \tag{4}
\]

Here, $G_l$ is the vector that points to the set space of gang $l$ from the location of the agent. When $l = i$, this
vector points towards the agent’s home set space, and when $l \neq i$, it points towards a different gang’s set
space. This concept is shown in the cartoon example in Figure 2.

In Equation 4, $H_i$ gives the rules for weighting towards a gang member’s own home set space. The
weightings toward or away from different gangs’ set spaces are determined by $D_{ij}$. Our $H_i$ and $D_{ij}$ take the
following form:

\[
H_i(\|G_i\|_2) = h_i \|G_i\|_2, \tag{5}
\]

\[
D_{ij}(\|G_j\|_2) = w_{ij}(R) \, \frac{1}{\|G_j\|_2}. \tag{6}
\]

One notable feature about these equations is that $H_i(\cdot)$ is large when an agent in gang $i$ is far from his or her
gang’s set space, but the $D_{ij}(\cdot)$ function is large when the agent is close to a rival gang $j$’s set space. The
factors $h_i$ and $w_{ij}(R)$ of the weighting functions are chosen according to the rules for agent movement. In
our implementation, the factor $w_{ij}(R)$ depends on the current state of the rivalry network. Negative values
of these functions result in repulsion and positive values result in attraction.
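
The following Python sketch shows one possible reading of the bias direction. It assumes that the bias vector is a weighted sum of the vectors pointing from the agent to each set space, with a home weight proportional to the distance from home and rival weights inversely proportional to the distance from each rival set space. Whether those vectors should be normalized is not settled by the text, so the sketch uses them unnormalized. The function name and arguments are hypothetical, and `arctan2` is used in place of a plain inverse tangent so that the angle lands in the correct quadrant.

```python
import numpy as np

def bias_direction(agent_pos, set_spaces, i, h_i, w_i):
    """Angle of bias (radians) for an agent belonging to gang i.

    set_spaces : (n, 2) array-like of set space coordinates.
    h_i        : home weighting factor (positive attracts).
    w_i        : length-n factors w_ij for the other gangs; entry i is
                 ignored. Negative values repel, positive values attract.
    """
    G = np.asarray(set_spaces, dtype=float) - np.asarray(agent_pos, dtype=float)
    norms = np.maximum(np.linalg.norm(G, axis=1), 1e-12)  # guard against /0
    weights = np.asarray(w_i, dtype=float) / norms  # assumed D_ij = w_ij / ||G_j||
    weights[i] = h_i * norms[i]                     # assumed H_i = h_i * ||G_i||
    x, y = (weights[:, None] * G).sum(axis=0)
    return np.arctan2(y, x)

# Agent at the origin, home set space to the east, one rival to the north.
set_spaces = [(1.0, 0.0), (0.0, 1.0)]
mu_home_only = bias_direction((0.0, 0.0), set_spaces, 0, h_i=1.0, w_i=[0.0, 0.0])
mu_with_rival = bias_direction((0.0, 0.0), set_spaces, 0, h_i=1.0, w_i=[0.0, -1.0])
```

With the rival weight set to zero the bias points straight at the home set space. A negative rival weight tilts the bias away from the rival, here toward the southeast.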


Figure 2: Cartoon example of the direction vectors incorporated in the direction of bias formula, Equation 4. The agent in this example is located at the dot. Here $G_1$, $G_2$, $G_3$, and $G_4$ show the vectors pointing toward the set spaces of gangs 1 through 4, respectively. Depending on the choices of $H_i$ and $D_{ij}$, different movement dynamics are possible.

After determining the direction of bias from Equation 4, we must choose in which direction the agent will move. The direction, $\theta$, is drawn from a von Mises distribution (also known as the Circular Normal distribution) [38, 29, 6]. For $\theta \in [-\pi, \pi]$, the von Mises distribution is given by

$$f(\theta \mid \mu, \kappa) = \frac{\exp\left(\kappa \cos(\theta - \mu)\right)}{2\pi I_0(\kappa)}.$$

Here $I_0$ is a modified Bessel function of order zero. The von Mises distribution requires two parameters, one for the angle of bias, $\mu$, and one for the strength of the bias, $\kappa$. We can think of $\mu$ as being the mean of the distribution, and $1/\kappa$ as being comparable to the variance. The larger $\kappa$ is, the stronger the bias is for the direction $\mu$. If $\kappa = 0$, this is a uniform distribution on a circle.
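As a small illustration of the density above, the following sketch evaluates it with NumPy and draws directions using NumPy's built-in von Mises sampler. The seed and the bias angle of 0.5 radians are arbitrary choices made for the example; $\kappa = 3.5$ is the Hollenbeck value listed in Table 1.

```python
import numpy as np

def von_mises_pdf(theta, mu, kappa):
    """Von Mises density on [-pi, pi] with bias angle mu and strength kappa."""
    # np.i0 is the modified Bessel function of the first kind, order zero.
    return np.exp(kappa * np.cos(theta - mu)) / (2.0 * np.pi * np.i0(kappa))

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
mu, kappa = 0.5, 3.5             # mu is an arbitrary example bias angle
theta = rng.vonmises(mu, kappa, size=10_000)  # sampled movement directions
```

Setting `kappa` to zero in `von_mises_pdf` returns the constant $1/(2\pi)$, which matches the uniform limit noted in the text.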

From the direction, $\theta$, and jump length, $x$, a prospective location is calculated. The new location is then checked to see whether the move would take the agent into a different region. Within the same region, movement is unrestricted. However, if the move would result in a region change, i.e., the agent would cross a semi-permeable boundary, the agent has a given probability of crossing into that region. If the agent moves, it searches the other agents to see if it is close enough to interact with agents of other gangs. When an interaction does occur, the rivalry matrix, $R$, is updated. The locations of interactions are also recorded and could be of interest to other applications; see the discussion in Section 6 and Figure 14.

The model is run until limiting behavior is observed in all of the metrics. In the absence of an observed network, it could be helpful to run simultaneous simulations with different random seeds and calculate the variance of each metric across the runs over the course of the simulation. When the variance of each metric levels off, the simulation can be terminated. In the case of Hollenbeck, the final network is taken after 20,000,000 iterations and then thresholded to ignore infrequent interactions. For more details on the long term behavior of the model, see Section 4.2.
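One way the variance-based stopping idea could be automated is sketched below. The text does not specify a numerical criterion, so the rule used here (the across-run variance changes by less than a relative tolerance over a trailing window of checkpoints) is an assumption, as are the names `variance_plateau`, `window`, and `tol`.

```python
import numpy as np

def variance_plateau(metric, window=5, tol=0.05):
    """First checkpoint at which the across-run variance has levelled off.

    metric : (n_runs, n_checkpoints) array of one metric recorded per run
    window : number of preceding checkpoints to compare against (hypothetical)
    tol    : allowed relative spread of the variance in the window (hypothetical)
    Returns the checkpoint index, or None if no plateau is found.
    """
    var = np.asarray(metric, dtype=float).var(axis=0)  # variance across runs
    for t in range(window, var.size):
        recent = var[t - window:t + 1]
        spread = recent.max() - recent.min()
        if spread <= tol * max(recent.mean(), 1e-12):
            return t
    return None
```

In practice the same check would be applied to every metric of interest, and the simulation stopped once all of them report a plateau.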


2.4. Initialization and Input Data

Before  the  simulation  begins,  the  region  map  and  an  estimated  density  of  the  road  networks  must  be 
provided  in  matrix  form  on  the  same  grid.  The  probability  of  crossing  each  boundary  must  also  be  provided. 
Additionally,  parameter  values  must  be  specified.  Table  1  describes  the  full  list  of  parameters  needed  for 
implementation.  At  the  start  of  the  simulation  all  of  the  agents  are  located  at  their  gang’s  set  space.  The 
size  of  each  gang  must  also  be  specified. 


| Parameter | Acceptable Values | Hollenbeck Values | Tested Range | Description |
|---|---|---|---|---|
| Agent | | | | |
| $x_m$ | $0 < x_m < a$ | 0.1 | | Minimum jump length |
| $k$ | $0 < k$ | 1.1 | [1, 1.9] | Bounded Pareto scaling parameter |
| $\kappa$ | $0 < \kappa$ | 3.5 | [1.5, 5] | Von Mises scaling parameter |
| $h_i$ | $-\infty < h_i < \infty$ | 1 | | Home weighting |
| $w_{ij}(R)$ | $-\infty < w_{ij}(R) < \infty$ | $-P_i(j)$ | | Rival gang weighting |
| Environment | | | | |
| $N_i$ | $N_i \in \mathbb{Z}^+$ | $14 < N_i < 598$ | | Number of gang members in gang $i$ |
| $S_i$ | $S_i \in \mathbb{R}^2$ | see Figure 1 | | Location of gang $i$ set space |
| $A$ | $a < A$ | 200 | [100, 400] | Largest maximum jump length |
| $a$ | $x_m < a < A$ | 100 | [100, 200] | Smallest maximum jump length |
| $B$ | $0 < B < 1$ | 0.2 | [0, 0.5] | Permeability of boundaries |
| Network | | | | |
| $T$ | $0 < T$ | 0.04 | [0, 0.06] | Threshold for existence of an edge |

Table  1:  Parameters  needed  for  model  implementation  are  listed  in  the  first  column.  The  second  column  lists  theoretically 
acceptable  parameter  values.  The  values  corresponding  to  the  SBLN  are  displayed  in  the  Hollenbeck  Values  Column.  The 
Tested  Range  column  provides  the  range  for  each  variable  for  simulations  run.  The  last  column  provides  a  description  of  each 
of  the  parameter  values. 


2.5.  Hollenbeck  parameters 

The  grid  of  environment  features  of  Hollenbeck  was  approximated  from  the  Google  Earth™  image 
in Figure 1. Hollenbeck is about 39.4 km² [56, 71]. In our implementation, one Hollenbeck city block
corresponds to approximately six grid elements. The interaction radius between agents is 3 units, or roughly
half a city block. The approximated road density and region grids are shown in Figure 3. The boundaries
of  the  Hollenbeck  region  were  approximated  using  points  from  the  geographic  features  visible  from  Google 
Earth™.  These  boundaries  were  used  to  construct  the  region  grid.  To  approximate  the  road  density, 
the  Weighted  H1  Maximum  Penalized  Likelihood  Estimation  method  was  used  [66].  Although  the  area  of 
Hollenbeck  does  not  have  large  invalid  regions  of  agent  movement,  alternative  cities  could  have  regions  where 
human  mobility  is  not  expected  to  occur,  such  as  in  lakes,  mountains,  and  oceans.  Other  methods  for  density 
estimation,  such  as  kernel  density  estimation  or  other  Maximum  Penalized  Likelihood  Estimation  (MPLE) 
methods,  could  also  be  used  to  construct  the  road  density  [64,  18,  43,  24].  To  extend  the  approximated 
road  density  to  the  same  sized  grid  as  the  region  grid,  the  average  value  of  the  density  over  Hollenbeck 
was  computed  and  used  for  the  extended  regions.  The  number  of  agents  in  each  gang  reflects  historical 
information  obtained  from  the  LAPD. 


Figure 3: The image on the left shows the location of Hollenbeck in the $N \times M$ environment grid. The semi-permeable
boundaries  encoded  in  the  model  are  displayed  in  the  center  image.  The  shades  of  gray  of  this  image  are  used  to  distinguish 
among  regions.  On  the  right,  we  used  a  Weighted  H1  Maximum  Penalized  Likelihood  Estimation  method  with  a  road  map  as 
the  initial  data  to  approximate  the  road  density  of  Hollenbeck  [66].  The  scale,  seen  on  the  far  right,  gives  the  approximated 
road  density  intensity.  Light  shades  of  gray  correspond  to  high  density  values  near  one  and  dark  shades  correspond  to  low 
densities  near  zero. 

The boundary crossing probability between regions was calculated from the minimum number of boundaries one must cross to get from one region to the next. For instance, if region 1 and region 2 were separated by one boundary, the agent would have a probability, $B$, of accepting a move from region 1 to region 2. If regions 1 and 2 were separated by $\alpha$ boundaries, then the agent would have a probability $B^{\alpha}$ of accepting the move.
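The acceptance rule for boundary crossings, combined with the move proposal from a direction and a jump length, might look like the sketch below. It is illustrative only: `propose_move`, the `region_of` lookup, and the `crossings` table of minimum boundary counts are hypothetical names, and the sketch assumes that an agent whose move is rejected stays where it is.

```python
import numpy as np

def propose_move(pos, theta, x, region_of, crossings, B, rng):
    """Propose a jump of length x in direction theta and apply the boundary rule.

    region_of : callable mapping a location to a region label (hypothetical helper)
    crossings : crossings[r][s] = minimum number of boundaries between regions r and s
    B         : permeability of a single boundary, between 0 and 1
    """
    new_pos = pos + x * np.array([np.cos(theta), np.sin(theta)])
    r_old, r_new = region_of(pos), region_of(new_pos)
    if r_old == r_new:
        return new_pos                        # movement within a region is unrestricted
    p_cross = B ** crossings[r_old][r_new]    # one factor of B per boundary crossed
    # Assumption: a rejected move leaves the agent at its current location.
    return new_pos if rng.random() < p_cross else pos
```

With the Hollenbeck value $B = 0.2$, a move across two boundaries would be accepted with probability $0.2^2 = 0.04$.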


3.  Baseline  Comparison  Models 


3.1.  Geographical  Threshold  Graphs  (GTG) 

For comparison to the networks produced by our simulations, we constructed an instance of a Geographical Threshold Graph (GTG). Geographical Threshold Graphs are random graphs that use spatial proximity to assist in determining whether or not two nodes are connected with an edge [39, 8, 9]. Geographical Threshold Graphs randomly assign weights $\eta_i$ to the $N$ nodes. Then, using an interaction function $F(\eta_i, \eta_j)$, an edge between nodes $n_i$ and $n_j$ exists only if

$$\frac{F(\eta_i, \eta_j)}{d(n_i, n_j)^{\beta}} > \text{Threshold},$$

where $d(n_i, n_j)$ is the distance between nodes $n_i$ and $n_j$. Constructing an instance of this graph is fast and computationally inexpensive.

Geographical Threshold Graphs are not deterministic in general. However, we are using this framework to construct one network in order to obtain a reasonable comparison to our proposed model. To this end, we provide deterministic weights, $\eta_i$, taken from data to be the size of each gang. In our case, we take the multiplicative weight function $F(\eta_i, \eta_j) = \eta_i \cdot \eta_j$, since this is the number of possible pairings between members of gang $i$ and gang $j$. We use Euclidean distance for the $d(n_i, n_j)$ function and choose $\beta = 2$. The threshold was chosen to give the same number of rivalries as the observed rivalry network.
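A deterministic GTG of this kind can be sketched as follows. Rather than fixing a numerical threshold, the sketch keeps the `n_edges` highest-scoring pairs, which is one way to match a target number of rivalries; this selection strategy, the function name `gtg_adjacency`, and the assumption that no two nodes share a location are choices made for the example.

```python
import numpy as np

def gtg_adjacency(positions, weights, n_edges, beta=2.0):
    """Adjacency matrix of a deterministic Geographical Threshold Graph.

    positions : (N, 2) array of node locations (assumed distinct)
    weights   : (N,) array of node weights, e.g. gang sizes
    n_edges   : number of edges to keep; this implicitly sets the threshold
    """
    positions = np.asarray(positions, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = weights.size
    iu, ju = np.triu_indices(n, k=1)                       # each unordered pair once
    d = np.linalg.norm(positions[iu] - positions[ju], axis=1)
    score = weights[iu] * weights[ju] / d**beta            # F(eta_i, eta_j) / d^beta
    keep = np.argsort(score)[::-1][:n_edges]               # highest scores first
    adj = np.zeros((n, n), dtype=bool)
    adj[iu[keep], ju[keep]] = True
    return adj | adj.T                                     # symmetric, no self-loops
```

If several pairs tie at the cut-off score, this sketch keeps exactly `n_edges` of them in an arbitrary order, which may differ from applying a strict inequality.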

3.2.  Brownian  Motion  Network  (BMN) 

Another  model  we  use  to  compare  with  the  simulated  network  is  a  simplified  version  of  the  proposed  model 
using  Brownian  Motion  and  unbiased  movement  rules.  Experiments  without  encoding  the  semi-permeable 
boundaries  were  conducted  with  unsatisfactory  results.  Therefore,  the  semi-permeable  boundaries  of  the 
model  are  incorporated.  In  this  model,  each  agent  chooses  the  next  prospective  location  from  a  standard 
normal  distribution,  ignoring  any  directional  decisions.  These  simplifications  reduce  the  number  of  variables 
to  the  threshold,  T,  and  the  permeability,  B,  while  still  incorporating  the  geographic  boundaries.  The 
parameter space around the Hollenbeck values was explored and run for $2 \cdot 10^7$ iterations. A priori, it was
unclear for how many iterations to run the simulation. We observed that the accuracy of the Brownian Motion
networks peaked around $1.2 \cdot 10^7$ iterations and then decreased as the simulations progressed. The parameters
and  number  of  iterations  that  produced  the  highest  accuracy  were  used  for  analysis.  We  will  refer  to  the 
resulting  network  as  the  Brownian  Motion  Network  (BMN). 

Inherent  in  the  BMN  is  a  level  of  stochasticity.  To  understand  how  this  stochasticity  influences  the  final 
rivalry  network  and  the  resulting  metrics,  the  BMN  simulation  was  run  for  100  different  seed  values.  The 
resulting  collection  of  final  networks  will  be  called  the  Ensemble  BMN. 

3.3.  Baseline  Network  Graphs 

Figure  4  displays  the  resulting  GTG  and  BMN  as  compared  to  the  observed  rivalry  network.  The  lower 
portion of the GTG graph has a shape similar to the observed network, but contains more connections. The
GTG  does  not  make  long  connections.  This  is  particularly  evident  in  the  upper  half  of  Hollenbeck.  The 
BMN  picks  up  many  of  the  longer  connections,  but  includes  far  too  many  connections. 


Figure  4:  A  visual  comparison  of  the  observed  rivalry  network  (left),  GTG  (center),  and  BMN  (right).  Here,  a  node  of  the 
network  represents  a  set  space,  and  an  edge  represents  a  rivalry  between  two  gangs. 


4.  Results 

The  proposed  model  produces  strong  results  when  compared  to  the  baseline  models.  For  analysis  of  the 
model, this section is divided into two parts, the internal properties of the model and the comparison among
models. Due to the stochastic nature of the movement rules, the final network is not deterministic. Despite
fluctuations  among  simulation  runs,  within  a  single  run  the  model  exhibits  long  term  stable  behavior  in  the 
metrics  used  for  analysis.  The  stochasticity  and  long  term  behavior  are  the  internal  properties  we  examine 
in  detail  in  Subsection  4.1  and  Subsection  4.2,  respectively.  The  stochastic  nature  of  the  model  allows  for 
a  more  realistic  scenario,  in  the  case  where  the  observed  network  is  just  one  instance  of  a  random  process. 
Further,  the  existence  of  stable  long  term  behavior  in  our  model  is  important  to  replicating  the  observed 
system,  since  research  has  demonstrated  that  the  rivalry  networks  among  gangs  tend  to  be  stable  over  time 
[70,  56,  53].  A  more  detailed  discussion  of  this  can  be  seen  at  the  end  of  Subsection  4.2. 

After  an  examination  of  the  internal  properties  of  the  model,  we  subject  the  models  to  a  number  of 
metrics  in  order  to  capture  the  features  of  the  observed  network  as  well  as  the  accuracy  of  model  networks. 
Keeping  in  mind  the  potential  for  variants  of  an  observed  network,  the  measures  we  chose  to  evaluate  the 
performance  of  our  model  are  fairly  robust  to  small  perturbations  to  the  observed  network.  The  metrics  used 
to  assess  the  shape  of  the  network  and  the  accuracy  are  defined  in  Subsection  4.3,  and  their  corresponding 
results  are  displayed  in  Subsection  4.4. 

For analysis and comparison, we took one simulation run as a showcase of the model. This network was obtained by searching the parameter space within the ranges specified in the fourth column of Table 1, allowing for dependencies between parameters. The 34,128 simulated networks were then sorted according to accuracy, defined in Equation 7. Because each of the gangs in Hollenbeck is active, the graph with the highest accuracy among those in which every node has non-zero degree was chosen. The parameter values for the optimal run are found in the third column of Table 1. We will refer to this as the Simulated Biased Lévy walk Network (SBLN). Figure 5 displays the network with our optimal parameters. The SBLN has a shape and structure similar to the observed network, but does not capture all of the longer edges. We also verified that all of the metrics we use to evaluate our model have reached a statistical equilibrium for the SBLN.


Figure  5:  Comparison  of  the  observed  rivalry  network  (left)  and  the  SBLN  (right).  Here,  a  node  of  the  network  represents  a 
set  space,  and  an  edge  represents  a  rivalry  between  two  gangs.  The  SBLN  has  a  shape  and  structure  similar  to  the  observed 
network,  but  does  not  capture  many  of  the  longer  edges. 


4.1. Stochastic Effects Observed in the Simulated Biased Lévy walk Network (SBLN)

Implicit in the model is a degree of stochasticity intended to capture the gross features of human movement. In particular, the jump length and direction choice are sampled from probability distributions, and the directional bias is determined by the (inherently stochastic) current rivalry structure. These elements affect the inclusion and exclusion of rivalry network edges. To understand the effect of stochasticity on the network produced by the model, the simulation was run 100 times with different random seed values and the same SBLN parameter values. We refer to the collection of runs as the Ensemble SBLN. Each simulation was run independently and evaluated with several metrics. The resulting metrics were then averaged for analysis.

We  also  recorded  the  persistence  of  each  edge  in  the  ensemble  of  networks,  and  this  is  denoted  as  the 
percent  edge  agreement.  For  example,  an  ensemble  network  with  10%  edge  agreement  refers  to  a  network 
consisting  of  all  edges  that  appear  in  at  least  10%  of  the  runs.  Figure  6  displays  the  Ensemble  SBLN 
with  100%,  50%,  and  1%  edge  agreement  next  to  the  observed  rivalry  network.  As  expected,  increasing  the 
percent  edge  agreement  decreases  the  number  of  edges  present  in  the  network.  The  network  constructed  with 
100%  edge  agreement  does  not  give  a  close  representation  of  the  observed  network,  because  there  are  too  few 
edges.  However,  allowing  for  50%  edge  agreement  produces  a  similar  shape  to  the  observed  network.  The 
Ensemble  SBLN  1%  edge  agreement  network  shows  all  possible  edges  observed  in  the  ensemble  of  simulation 
runs.  Taken  together,  these  images  demonstrate  the  stochastic  effects  inherent  in  the  model. 
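The percent edge agreement network can be computed directly from the stack of final adjacency matrices. The sketch below is a minimal illustration; the function name `agreement_network` and the representation of each run as a boolean adjacency matrix are assumptions.

```python
import numpy as np

def agreement_network(adjs, percent):
    """Edges that appear in at least `percent` percent of the ensemble runs.

    adjs    : sequence of (N, N) boolean adjacency matrices, one per run
    percent : required edge agreement, e.g. 1, 50, or 100
    """
    adjs = np.asarray(adjs, dtype=bool)
    count = adjs.sum(axis=0)                 # number of runs containing each edge
    # Integer comparison avoids floating-point issues at exact percentages.
    return count * 100 >= percent * adjs.shape[0]
```

Raising `percent` can only remove edges, which is the monotone behavior described above for the 1%, 50%, and 100% networks.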


Figure  6:  Percent  edge  agreement  for  the  ensemble  of  runs  for  the  SBLN  parameter  values.  These  four  images  give  a  comparison 
of,  from  left  to  right,  the  observed  rivalry  network,  the  Ensemble  SBLN  1%  edge  agreement,  the  Ensemble  SBLN  50%  edge 
agreement,  and  the  Ensemble  SBLN  100%  edge  agreement.  Here,  a  node  of  the  network  represents  a  set  space,  and  an  edge 
represents  a  rivalry  between  two  gangs. 


For  comparison,  we  simulated  a  random  model  that  incorporates  only  the  distance  between  nodes.  In 
particular,  we  constructed  a  collection  of  randomly  weighted  Geographical  Threshold  Graphs  by  fixing  the 
locations of the nodes and sampling the weights, $\eta_i$, independently from a uniform distribution. We selected
a  threshold  to  yield  a  median  of  69  edges.  Figure  7  displays  the  percent  agreement  of  each  possible  edge  for 
the  Ensemble  SBLN,  a  collection  of  randomly  weighted  Geographical  Threshold  Graphs,  and  the  Ensemble 
BMN.  For  visualization,  the  edges  for  each  ensemble  were  sorted  separately  in  descending  order  based  on 
percent  edge  agreement.  In  the  Ensemble  SBLN,  there  is  100%  edge  agreement  for  the  existence  of  39  of  the 
edges  (corresponding  to  the  first  39  edges  of  the  Ensemble  SBLN  along  the  horizontal  axis  in  Figure  7).  The 
100%  edge  agreement  network  in  Figure  6  shows  these  edges.  All  runs  in  the  Ensemble  SBLN  consistently 
agree  on  the  nonexistence  of  309  edges  (corresponding  to  the  last  309  edges  of  the  Ensemble  SBLN  in 
Figure  7).  These  are  the  edges  not  appearing  in  the  1%  edge  agreement  network  in  Figure  6. 

The  transition  between  edge  existence  and  nonexistence  in  the  Ensemble  SBLN  is  marked  by  a  steep 
drop  over  58  edges.  The  collection  of  randomly  weighted  Geographical  Threshold  Graphs  displays  a  large 
degree  of  stochasticity  indicated  by  fewer  edges  with  100%  edge  agreement  and  the  more  gradual  decline 
of  edge  agreement.  The  Ensemble  BMN  appears  to  have  a  smaller  degree  of  stochasticity  with  more  edges 
with  100%  edge  agreement  and  a  steeper  decline  than  the  Ensemble  SBLN  and  the  collection  of  randomly 
weighted  Geographical  Threshold  Graphs.  Despite  the  stochasticity  observed  in  these  models,  there  is 
agreement  among  the  edges  of  the  Ensemble  BMN  and  Ensemble  SBLN,  maintaining  some  structure  within 
the  simulated  networks. 

Further analysis was conducted on the stochastic nature of the proposed SBLN model by looking at the
metrics calculated on the network produced with varying levels of percent edge agreement. Plots of the
accuracy,  density,  and  nodal  degree  variance  for  the  collection  of  randomly  weighted  Geographical  Threshold 
Graphs,  the  Ensemble  SBLN,  and  the  Ensemble  BMN  are  displayed  in  Figure  8.  A  definition  of  these  metrics 
can  be  found  in  Subsection  4.3.  As  the  percent  edge  agreement  increases,  the  accuracy  metric  generally 
increases  for  the  randomly  weighted  Geographical  Threshold  Graphs.  However,  the  Ensemble  SBLN  and 
Ensemble  BMN  do  not  show  much  variation,  though  it  is  notable  that  the  Ensemble  SBLN  consistently 
has  higher  accuracy  across  percent  edge  agreements.  For  the  density  metric,  as  the  percent  edge  agreement 
increases,  the  density  decreases  as  expected.  The  randomly  weighted  Geographical  Threshold  Graphs  have 
the  greatest  variation  in  the  density  metric  and  nodal  degree  variance  metric.  This  implies  that  there  is  a 
greater  degree  of  stochasticity  for  this  model.  Alternatively,  more  structure  is  seen  in  the  Ensemble  SBLN 
and  Ensemble  BMN.  The  two  ensembles  show  small  variations  in  the  density  and  nodal  degree  variance 
metrics  for  percent  edge  agreements  between  15  and  75.  Note  that  the  Ensemble  BMN  has  little  change  in 
nodal  degree  variance  across  all  percent  edge  agreements.  This  indicates  a  lower  degree  of  stochasticity  in 
the  model,  which  agrees  with  the  steep  decline  in  Figure  7. 


[Figure 7 plot. Horizontal axis: Graph Edge Sorted in Descending Order of Percent Agreement.]

Figure 7: Plot of the edge persistence for the Ensemble SBLN (solid), Ensemble BMN (thin-dash), and an ensemble of random Geographical Threshold Graphs (thick-dash). The randomly weighted Geographical Threshold Graphs were constructed with random weights and have a median of 69 edges present. The edges were sorted in descending order according to the proportion of simulation runs where the edge is present in the network. Each ensemble of runs was sorted separately, yielding different edge numbers among ensembles.


4.2. Long Term Behavior of the SBLN

The  simulated  network,  through  the  movements  of  each  of  the  agents,  evolves  as  the  simulation  progresses. 
Because  of  this  evolution,  it  is  natural  to  ask  if  any  sort  of  steady  state  is  achieved.  Keeping  in  mind  the 
stochasticity  of  the  model  and  the  interaction  between  the  network  and  the  agents’  movements,  an  equilibrium 
in  the  strictest  sense  cannot  be  obtained.  Despite  this,  the  results  indicate  there  is  limiting  behavior  of  the 
observed  metrics  as  the  simulation  progresses.  Figure  9  displays  the  density  and  accuracy  over  the  progression 
of  the  simulations  for  the  Ensemble  SBLN.  In  general,  the  accuracy  metric  we  used  is  a  measure  of  how  close 
the  simulated  network  is  to  the  observed  network.  The  density  of  a  graph  is  proportional  to  the  average 
number  of  rivals  of  all  gangs;  for  further  definitions  of  these  metrics,  refer  to  Subsection  4.3.  Each  run  is 
observed  every  1,000  iterations,  and  the  results  of  each  simulation  are  shown  as  a  thin  line.  The  average 
metric  value  at  each  iteration  is  calculated  and  plotted  as  the  thick  line.  For  visual  investigation,  the  vertical 
axis  on  the  accuracy  plot  has  been  refined  to  include  only  the  area  of  interest.  Accuracy  values  can  range 
from 0 to 1. Both of these plots suggest that after a short phase of initialization, the metrics of each run seem to stabilize. For the average values of the density and accuracy of the last iteration, refer to Tables 2 and 3. Further, we tracked the variance of the metrics over the course of the simulation; plots of the variance for density and accuracy are shown in Figure 10. We observe that the variance of the network metrics across simulations levels out, indicating an appropriate time to terminate the simulation.

[Figure 8 panel titles: Accuracy of the Network Resulting from a Given Percent Edge Agreement; Density of the Network Resulting from a Given Percent Edge Agreement; Variance of Nodal Degree of the Network Resulting from a Given Percent Edge Agreement.]

Figure 8: Plots of the measured Accuracy (Top), Density (Middle), and Variance (Bottom) of the networks constructed from a given percent edge agreement. The metric values corresponding to the Ensemble SBLN (solid), Ensemble BMN (thin-dash), and an ensemble of random Geographical Threshold Graphs (thick-dash) are plotted together. The randomly weighted Geographical Threshold Graphs were constructed with random weights and have a median of 69 edges present.

The Ensemble SBLN exhibits stable long term behavior in the simulated rivalry network, with some variation due to stochasticity. Despite this variation, the network emerging from the model results in metrics with a small deviation from the average. Further, the stochasticity observed may provide a more realistic model of the true rivalry structure. Research has demonstrated that the rivalry networks that link gangs tend to be stable over time [70, 56, 53], and that the activity spaces of gangs are anchored to specific places [69, 44]. However, over longer periods of time, the membership ranks of gangs may ebb and flow due to incarceration, individuals "aging out" of active status, or other forms of incapacitation [71]. Thus, gangs may lie dormant and, though identified in the rivalry network, not actually participate in violence. In extreme cases, either through high levels of victimization at the hands of rival gangs or through the focused enforcement of law enforcement agencies, a gang may simply disappear altogether. As more data become available, inherent stochasticity in the model may allow for further understanding of the rivalry structure.

[Figure 9 panel titles: Accuracy of Ensemble SBLN Networks at Each Iteration; Density of Ensemble SBLN Networks at Each Iteration. Horizontal axis: Iteration Number.]

Figure 9: Plots of the accuracy (top) and the density (bottom) of the SBLN over the $2 \cdot 10^7$ iterations. Each of the 100 Ensemble SBLN runs is plotted as a thin line. The average over all the runs at each sampled iteration is shown with the solid, thick line. The density of the observed network is shown in the thick, dashed line. For visual investigation the vertical axis on the accuracy plot has been refined to include the area of interest. Accuracy values can range from 0 to 1.

4.3. Metrics Used for Analysis

We  analyze  our  model  according  to  several  common  metrics  for  accuracy  and  shape.  Since  we  are 
examining  a  rivalry  network,  there  are  certain  popular  measures  that  are  not  applicable  here.  For  example, 
the clustering coefficient is often used in social and friendship networks, describing the proportion of a node's
neighbors who are also neighbors of one another [1, 48, 76]. In the analysis of gangs this metric is not relevant because
the  rival  of  a  rival  gang  is  not  necessarily  a  rival.  In  addition,  there  may  be  errors  in  an  observed  rivalry 
network.  With  these  in  mind,  we  carefully  choose  measures  that  are  applicable  to  gang  rivalry  networks 
and  robust  to  small  perturbations  in  the  observed  data.  For  example,  the  degree  distribution  is  commonly 
used in social network analysis [79, 72, 48, 50, 49, 1]. In the context of gangs, the degree distribution of the network describes the number of rivalries of each gang. Knowing the distribution of rivalries at the city level may be helpful to policy makers for determining how to allocate resources. The measures we used are motivated and defined in Sections 4.3.1 and 4.3.2.

[Figure 10 panel titles: Variance of Accuracy Metric of Ensemble SBLN Networks at Each Iteration; Variance of Density Metric of Ensemble SBLN Networks at Each Iteration.]

Figure 10: Plots of the variance of the accuracy (top) and the density (bottom) measures of the Ensemble SBLN over the $2 \cdot 10^7$ iterations. The variances of the 100 Ensemble SBLN runs were calculated at 1,000 iteration increments.

4.3.1. Accuracy Metrics

The  first  measures  of  interest  are  the  raw  values  for  the  number  of  correct  and  incorrect  edges.  These 
values  provide  a  means  for  evaluating  the  performance  of  the  model.  However,  when  comparing  the  observed 
network  with  the  constructed  network,  each  edge  can  be  correct  in  two  ways  and  incorrect  in  two  ways. 
First,  the  constructed  network  can  correctly  identify  an  edge,  true  positive  (TP),  and  correctly  identify  the 
lack  of  an  edge,  true  negative  (TN).  The  constructed  network  can  also  be  wrong  in  two  different  ways.  It 
can  place  an  edge  where  there  is  none,  false  positive  (FP),  and  also  fail  to  place  an  edge  where  there  is  one, 
false  negative  (FN). 

There  are  three  quantities  that  are  of  particular  interest  that  summarize  the  TP,  TN,  FP,  and  FN  values. 
First  is  the  accuracy  of  the  model.  The  accuracy  in  the  context  of  edges  on  a  graph  is  defined  by 

$$\mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{7}$$

The ACC ranges between 0 and 1, with 1 being a perfect reproduction of the observed network. This measure is proportional to the $Q_\alpha$ measure discussed in [3]. The $F_1$ score provides another measure to analyze the accuracy of the predicted network [60, 80] and is defined as

$$F_1 = \frac{2\,TP}{2\,TP + FP + FN}. \tag{8}$$

An exact replication of the network would have an $F_1$ score of 1. The other summary statistic for the raw closeness to the network is the Matthews Correlation Coefficient (MCC) [40, 3]. This measurement varies between $-1$ and 1, where a value of 1 is a perfect prediction. The MCC is defined as follows:

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}. \tag{9}$$



The ACC, $F_1$ score, and MCC provide a summary of the TP, TN, FP, and FN. These measures are fairly robust to changes in the observed data. For example, if one edge were added or removed in the observed network, the ACC would change by at most $\frac{2}{N(N-1)}$, where $N$ is the number of nodes.

The measurements of the TP, TN, FP, and FN provide one means by which to determine the success of the model. However, they do not describe how these correct or incorrect measurements affect the overall network structure. A strong model would create a network that is not only accurate but also maintains the same network structure, even in the event that the individual connections are not the same.
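The accuracy measures above translate into a short computation on a pair of adjacency matrices. The sketch below is illustrative: it assumes undirected graphs, so each unordered pair of nodes is counted once, and it returns an MCC of 0 when the denominator vanishes, which is a common convention rather than something stated in the text. The function names are hypothetical.

```python
import numpy as np

def confusion_counts(observed, simulated):
    """TP, TN, FP, FN over all unordered node pairs of two undirected graphs."""
    iu = np.triu_indices(observed.shape[0], k=1)   # upper triangle, no diagonal
    o = observed[iu].astype(bool)
    s = simulated[iu].astype(bool)
    tp = int(np.sum(o & s))      # edge present in both networks
    tn = int(np.sum(~o & ~s))    # edge absent in both networks
    fp = int(np.sum(~o & s))     # edge only in the simulated network
    fn = int(np.sum(o & ~s))     # edge only in the observed network
    return tp, tn, fp, fn

def accuracy_metrics(tp, tn, fp, fn):
    """ACC, F1 score, and MCC computed from the four confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)               # undefined if tp = fp = fn = 0
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # assumed convention
    return acc, f1, mcc
```

For $N$ nodes there are $N(N-1)/2$ unordered pairs, so flipping a single edge in the observed network moves exactly one pair between categories and changes the ACC by $\frac{2}{N(N-1)}$.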


4.3.2. Shape Metrics

We  would  like  to  verify  that  the  simulated  network  has  a  similar  shape  to  that  of  the  observed  network. 
To  do  this,  we  calculate  the  graph  density,  standard  variance  of  nodal  degree  and  Freeman’s  centrality 
measure  of  the  graph  [76,  21],  Given  N  nodes,  the  density  of  a  graph  is  defined  by 


Edegree(i) 
N{N  -  1)' 


(10) 


In the context of gangs, the degree of a gang is equivalent to the number of rivals of the gang. The density of the rivalry network is the average number of rivalries scaled by a normalization, $\frac{1}{N-1}$. Networks with the same number of edges and nodes have the same density measure. Further, this metric is fairly robust to perturbations of the graph. For instance, if one edge were added or removed, the density metric would only change by $\frac{2}{N(N-1)}$.
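As a rough worked check of this robustness claim, take N = 29. This value is not quoted in this section; it is inferred here from the counts in Table 2, which sum to 406 node pairs, and 29 · 28 / 2 = 406. Adding or removing one edge changes the degree sum by 2 and the classification of one node pair, so

$$\Delta\,\text{density} = \frac{2}{N(N-1)} = \frac{2}{812} \approx 0.0025, \qquad \Delta\,\mathrm{ACC} = \frac{1}{N(N-1)/2} = \frac{1}{406} \approx 0.0025.$$

Both changes are small relative to the observed density of 0.16995 in Table 3 and to the accuracy values near 0.9 in Table 2.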

Two other metrics, the variance of nodal degree and centrality measure, give an indication of the spread of degrees among the gangs. The variance of nodal degree for a graph is

$$\sum_{i=1}^{N} \frac{\left(\text{degree}(i) - \text{aveDegree}\right)^2}{N}. \qquad (11)$$

The centrality measure of the graph is defined to be

$$\sum_{i=1}^{N} \frac{\text{maxDegree} - \text{degree}(i)}{(N-1)(N-2)}. \qquad (12)$$


These measures provide summary statistics with which to describe the general shape of the graph. This becomes useful when comparing the output of multiple models. For instance, if the observed network has a few gangs with high degrees and the rest of the gangs with low degrees, then this will be reflected in high values of the spread statistics. A good model should capture this feature.
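The three shape metrics depend only on the list of nodal degrees, so they can be sketched in a few lines. The Python below is an illustration under that assumption; the function name `shape_metrics` is hypothetical and does not come from the paper.

```python
def shape_metrics(degrees):
    """Return (density, degree variance, centrality) for a list of nodal degrees."""
    n = len(degrees)
    if n < 3:
        raise ValueError("the centrality normalization (N-1)(N-2) needs N >= 3")
    total = sum(degrees)
    # Density: degree sum divided by N(N-1), as in equation (10).
    density = total / (n * (n - 1))
    # Variance of nodal degree around the average degree, divided by N, as in (11).
    ave_degree = total / n
    variance = sum((d - ave_degree) ** 2 for d in degrees) / n
    # Centrality: summed gaps to the maximum degree over (N-1)(N-2), as in (12).
    max_degree = max(degrees)
    centrality = sum(max_degree - d for d in degrees) / ((n - 1) * (n - 2))
    return density, variance, centrality
```

A star on five nodes, with degrees `[4, 1, 1, 1, 1]`, gives a centrality of 1, while a ring with degrees `[2, 2, 2, 2, 2]` gives zero variance and zero centrality. This matches the intuition that the spread statistics are large when a few gangs hold most of the rivalries.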

The degree distribution has been widely used to understand the overall network structure [48, 50, 49, 1]. We compare the nodal degree cumulative distribution function (CDF) of our simulations with the observed network.
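The empirical CDF of nodal degree can be sketched as follows. This is an illustrative Python helper with a hypothetical name; it assumes the CDF is evaluated at the integer degrees 0, 1, ..., `max_degree`.

```python
def degree_cdf(degrees, max_degree=None):
    """Fraction of nodes with degree <= k, for k = 0, ..., max_degree."""
    if max_degree is None:
        max_degree = max(degrees)
    n = len(degrees)
    return [sum(1 for d in degrees if d <= k) / n for k in range(max_degree + 1)]
```

Passing a common `max_degree` to every network makes the resulting lists directly comparable entry by entry.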
4.4. Evaluating Models using Graph Metrics

4.4.1. Accuracy Metric Results

Table 2 provides the accuracy measures for the GTG, BMN, Ensemble BMN, SBLN, and Ensemble SBLN. The SBLN outperforms all of the other networks on all of the accuracy metrics. Observe that the GTG also performs well on these metrics. The Ensemble SBLN metrics are comparable to the GTG and BMN metrics. In particular, the average numbers of true negatives (TN) and false positives (FP) are slightly better for the Ensemble SBLN than for the GTG, BMN, and Ensemble BMN. The Ensemble SBLN averages of the true positives (TP) and false negatives (FN) are slightly worse than those of the GTG and BMN. Only the GTG and SBLN have higher accuracy, F1 score, and MCC values than the Ensemble SBLN average.


| Network | TP | TN | FP | FN | ACC | F1 Score | MCC |
|---|---|---|---|---|---|---|---|
| SBLN | 50 | 320 | 17 | 19 | 0.9113 | 0.7353 | 0.6822 |
| Ensemble Average SBLN ± σ | 45.50 ± 1.269 | 316.1 ± 2.424 | 20.90 ± 2.424 | 23.50 ± 1.269 | 0.8906 ± 0.0077 | 0.6722 ± 0.020 | 0.6069 ± 0.025 |
| GTG | 48 | 316 | 21 | 21 | 0.8966 | 0.6957 | 0.6333 |
| BMN | 47 | 313 | 24 | 22 | 0.8867 | 0.6714 | 0.6031 |
| Ensemble Average BMN ± σ | 43.61 ± 1.380 | 309.2 ± 1.390 | 27.76 ± 1.390 | 25.39 ± 1.380 | 0.8691 ± 0.0051 | 0.6213 ± 0.016 | 0.5424 ± 0.019 |

Table 2: Accuracy measures for the SBLN, Ensemble SBLN, GTG, BMN, and Ensemble BMN. The σ denotes the standard deviation of the ensemble metric values.


4.4.2. Shape Metric Results

Table 3 provides the shape measures for the observed network, GTG, BMN, Ensemble BMN, SBLN, and Ensemble SBLN. Note that the density of the GTG is exactly the same as that of the observed rivalry network by construction, but the GTG does not perform well for the nodal degree variance. The densities of the BMN, Ensemble BMN, SBLN, and Ensemble SBLN are all close to that of the observed network. The BMN and the Ensemble BMN average have the closest nodal degree variance to the observed network’s nodal degree variance. The centrality measure for the SBLN is the closest to that of the observed network.

The cumulative distribution function (CDF) of nodal degree for the observed network, GTG, BMN, normalized Ensemble BMN, SBLN, and the normalized Ensemble SBLN are shown in Figure 11. A normalized ensemble CDF shows the CDF of the degree distribution of all runs divided by the number of runs. The SBLN and the normalized Ensemble BMN have the distributions most similar to that of the observed network. The normalized Ensemble SBLN performs better than the GTG and the BMN. In the same figure, the normalized Ensemble BMN and SBLN are plotted with two standard deviations above and below together with the observed network distribution. Here we see that there is a smaller standard deviation for the normalized BMN than the normalized SBLN. Even with the standard deviations, the degree distributions of both classes of networks are close to that of the observed degree distribution.
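One way to read the normalized ensemble CDF and its two-standard-deviation band is as a pointwise mean and spread over the per-run CDFs. The sketch below takes that reading as an assumption; the function names are hypothetical, and the use of the population standard deviation is a choice made here, not something stated in the text.

```python
from statistics import fmean, pstdev


def degree_cdf(degrees, max_degree):
    """Fraction of nodes with degree <= k, for k = 0, ..., max_degree."""
    n = len(degrees)
    return [sum(1 for d in degrees if d <= k) / n for k in range(max_degree + 1)]


def ensemble_cdf_band(runs, max_degree, n_std=2.0):
    """Pointwise mean CDF over runs, with lower and upper bands at n_std deviations."""
    per_run = [degree_cdf(degrees, max_degree) for degrees in runs]
    columns = list(zip(*per_run))  # one tuple of CDF values per degree k
    mean = [fmean(col) for col in columns]
    std = [pstdev(col) for col in columns]
    lower = [m - n_std * s for m, s in zip(mean, std)]
    upper = [m + n_std * s for m, s in zip(mean, std)]
    return mean, lower, upper
```

When every run has the same number of nodes, the pointwise mean equals the CDF of all degrees pooled across runs, which is consistent with dividing by the number of runs.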

4.5. Summary of Results

In all metrics except the density, the SBLN performs better than the GTG (note that the density measure of the GTG is exactly the same as the observed network by construction). Although the GTG is unable to closely replicate the standard shape measures, it has fairly high accuracy values. The Ensemble SBLN average performs similarly to the GTG in the accuracy, but performs better with shape measures, even with the stochastic considerations. On average, the Ensemble SBLN produces a slightly more accurate degree distribution than the GTG. The BMN is able to reproduce the degree distribution fairly well; however, the BMN and Ensemble BMN average have lower values for the accuracy (ACC), Matthews Correlation Coefficient (MCC), and F1 score when compared to the other models. Our analysis demonstrates that the SBLN is the strongest model in reproducing the observed rivalry network.

| Network | Density | Variance of Nodal Degree | Centrality |
|---|---|---|---|
| Observed | 0.16995 | 4.32105 | 0.20106 |
| SBLN | 0.16503 | 3.54578 | 0.16799 |
| Ensemble Average SBLN ± σ | 0.16355 ± 0.005593 | 3.66423 ± 0.48395 | 0.15040 ± 0.01883 |
| GTG | 0.16995 | 9.97622 | 0.27778 |
| BMN | 0.17488 | 3.88585 | 0.15741 |
| Ensemble Average BMN ± σ | 0.17579 ± 0.004546 | 3.93926 ± 0.41351 | 0.16065 ± 0.02635 |

Table 3: This table provides the shape measures for the observed network, SBLN, Ensemble SBLN, GTG, BMN, and Ensemble BMN. The σ denotes the standard deviation of the ensemble metric values. Note that the density of the GTG is exactly the same as the observed rivalry network by construction.

Figure 11: The top figure plots together the cumulative distribution functions of the degree distribution for the observed network (thick-solid), GTG (thick-dashed), BMN (dot-dash), normalized Ensemble BMN (thin-dash), SBLN (thin-solid), and normalized Ensemble SBLN (dot-solid). A normalized ensemble CDF shows the CDF of the degree distribution of all runs divided by the number of runs. The normalized Ensemble BMN (bottom left) and SBLN (bottom right) are plotted with two standard deviations above and below (thin-dash) with the observed network distribution (thick-dash).


5.  Sensitivity  Analysis 

Our objective in this section is to understand the effects of the input parameters on the system by comparing the different metrics of the resulting networks as the parameters change. Due to computational constraints, we perform a local analysis of the parameter space around the SBLN parameters specified in column 3 of Table 1.

In particular, we perturb one parameter at a time by up to 30% above and below the SBLN parameter value, in 10% increments. To account for the stochasticity inherent in the model, each perturbation was run using the same 25 seed values for the random number generator. The range of each parameter examined is listed in Table 4.


| Parameter | Symbol | Range |
|---|---|---|
| Bounded Pareto Scaling Parameter | k | [0.77, 1.43] |
| Von Mises Scaling Parameter | κ | [2.45, 4.55] |
| Largest Maximum Jump Length | A | [140, 260] |
| Smallest Maximum Jump Length | a | [70, 130] |
| Boundary Permeability | B | [0.14, 0.26] |
| Network Threshold | T | [0.028, 0.052] |

Table 4: Ranges of the parameters used in the sensitivity analysis. Each parameter was changed 30% from the SBLN parameters in 10% increments. For SBLN parameter values, refer to the Hollenbeck column of Table 1.
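The one-at-a-time perturbation scheme can be sketched as below. This is an illustration only: the baseline values are inferred here as the midpoints of the ranges in Table 4 (Table 1 itself is not reproduced in this section), and the dictionary keys and function name are hypothetical.

```python
# Baseline values assumed to be the midpoints of the Table 4 ranges.
BASE = {"k": 1.1, "kappa": 3.5, "A": 200.0, "a": 100.0, "B": 0.2, "T": 0.04}

# Multiplicative factors for -30% to +30% in 10% increments.
FACTORS = [pct / 100 for pct in range(70, 131, 10)]

# The same 25 seeds are reused for every perturbation.
SEEDS = list(range(25))


def perturbation_grid(base=BASE, factors=FACTORS):
    """Yield (name, factor, params), varying one parameter at a time."""
    grid = []
    for name in base:
        for factor in factors:
            params = dict(base)  # copy so that all other parameters stay at baseline
            params[name] = base[name] * factor
            grid.append((name, factor, params))
    return grid
```

Each entry of the grid would then be simulated once per seed in `SEEDS`, so that differences between parameter settings are not confounded with differences in the random number stream.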


For each simulation run, we compute the accuracy, Matthews Correlation Coefficient, F1 score, centrality measure, variance of nodal degree, and density for the resulting network. Plots of each combination of metric versus parameter values were created for the general analysis. Three examples of parameter and metric combinations with more dramatic results are plotted in Figure 12. In this figure, we display the variance of nodal degree versus the smallest maximum jump length, a, and the network threshold, T. We also display the density versus the Bounded Pareto scaling parameter, k, where the vertical axis has been rescaled for visualization. The dots represent the metric values of the simulation run at the specified parameter. The solid curve indicates the average metric value over all runs at each parameter value.


[Figure 12 panels: "Degree Variance for Simulations with Perturbed Smallest Maximum Jump Length Values" (horizontal axis: Smallest Maximum Jump Length); "Degree Variance for Simulations with Perturbed Network Threshold Values" (horizontal axis: Threshold); "Density for Simulations with Perturbed Bounded Pareto Scaling Values" (horizontal axis: Bounded Pareto Scaling Parameter)]

Figure 12: Plots of the nodal degree variance versus the smallest maximum jump length (top left), and the network threshold (top right). We also display the density versus the Bounded Pareto scaling parameter (bottom), where the vertical axis has been rescaled for visualization. The solid curve indicates the average metric value over all runs at each parameter value. The dots represent the metric values of the simulation run at the specified parameter.


As seen in Figure 12, the plots varying the network threshold and Bounded Pareto scaling parameters have a negative trend on average. The smallest maximum jump length, however, shows a positive trend. The stochastic effects can also be observed by the range of metric values associated with each parameter input, as illustrated by the dots in Figure 12. These plots suggest that stochasticity may influence the metric values for a particular run, and on average the resulting metric output is sensitive with respect to these parameters.

These plots give a view of how a particular metric and parameter value interact. We changed all of the parameter values by the same 30% from the SBLN parameters, and so we can compare plots with the same metric. For example, in Figure 12, we can see that in general the nodal degree variance for the smallest maximum jump length has a steeper trend than the nodal degree variance for the threshold, but we cannot compare the trend of the nodal degree variance plots directly to that of the density plot.

To compare the effects of all the parameters on all metrics, we rescale the data points to percent deviation from the SBLN parameter values. For example, when considering the effects of the Bounded Pareto scaling parameter, k, on the density metric, we rescaled the observed data points

$$(k_i, \text{density}_i) \mapsto \left( \frac{k_i - k_{SBLN}}{k_{SBLN}}, \; \frac{\text{density}_i - \text{density}_{SBLN}}{\text{density}_{SBLN}} \right),$$

where $k_{SBLN}$ is the SBLN Bounded Pareto scaling parameter. Here, $\text{density}_{SBLN}$ is the average density at the $k_{SBLN}$ value for all 25 runs. A line was fitted to the rescaled data points, and the slope of this line was recorded. This process was repeated for each parameter and metric value combination.
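The rescaling and line fit can be sketched in plain Python as follows. The text says only that a line was fitted, so ordinary least squares is an assumption here, and the function name `rescaled_slope` is hypothetical.

```python
def rescaled_slope(param_values, metric_values, param_base, metric_base):
    """Slope of the least-squares line through the relative deviations.

    param_base is the SBLN parameter value; metric_base is the average metric
    value over all runs at that parameter value.
    """
    xs = [(p - param_base) / param_base for p in param_values]
    ys = [(m - metric_base) / metric_base for m in metric_values]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("parameter values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx
```

Because both axes are expressed as relative deviations, the resulting slope is dimensionless. A slope of -0.74, for example, would mean that a 10% increase in the parameter is associated with roughly a 7.4% decrease in the metric, which is what makes slopes for different parameters and metrics comparable.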

The results are recorded in Table 5 and visualized in Figure 13. In Table 5, negative values indicate a negative slope of the best fit line to the scaled data, and positive values indicate a positive slope. Slopes with a greater magnitude indicate that the metric is more sensitive to the parameter. To get a clearer impression of the overall sensitivity of the system, this information is displayed in Figure 13. The dark and light intensities of the color map represent large positive and large negative values of the best fit line slope, respectively.


| Metric | k | κ | A | a | B | T |
|---|---|---|---|---|---|---|
| Accuracy | -0.0120 | -0.0031 | 0.0011 | 0.0001 | 0.0031 | 0.0000 |
| MCC | -0.2066 | -0.0161 | 0.0023 | 0.1458 | 0.0293 | -0.0000 |
| F1 Score | -0.2149 | -0.0146 | 0.0018 | 0.1562 | 0.0278 | -0.0000 |
| Centrality | -0.1705 | -0.0100 | -0.0131 | 0.7119 | 0.0195 | -0.1751 |
| Nodal Degree Variance | -0.4385 | -0.0489 | -0.0146 | **0.9456** | 0.0154 | **-0.5412** |
| Density | **-0.7410** | -0.0080 | -0.0114 | 0.6131 | 0.0640 | -0.2460 |

Table 5: Slope of the best fit to the rescaled data for each metric and parameter combination. For reference, coefficients that correspond to the images in Figure 12 are highlighted in bold font. Figure 13 displays this information in a color map.


In general, the metrics are not very sensitive to the von Mises parameter, κ, the largest maximum jump length, A, and the boundary permeability, B, within the parameter space investigated. On the other hand, the Bounded Pareto scaling parameter, k, the smallest maximum jump length, a, and the network threshold, T, have the most influence on the metrics. As seen in the table and figure, the accuracy measures are fairly robust to changes in all parameter values. Further, note that nodal degree variance and density measures appear to be the most affected by the changes in these parameters.

The Bounded Pareto scaling parameter values result in negative slopes for all metrics. This is to be expected because an increase in the Bounded Pareto scaling parameter will decrease the likelihood of larger jumps and result in fewer edges. This phenomenon is particularly evident in the density metric. Also, this parameter appears to have the most effect on the accuracy measures, in particular the MCC and F1 score.

Increasing the network threshold parameter also has a negative effect on the shape metrics. By increasing the network threshold, the number of connections decreases. This in turn decreases the density, nodal degree variance, and centrality. On the other hand, increasing the smallest maximum jump length increases the connectivity of the network by allowing for larger jumps in areas of high road density. The effect of changing this parameter is more significant than changing the largest maximum jump length. Interestingly, as the largest maximum jump length increases, the connectivity decreases. This could be because attempts to cross boundaries are more likely to occur. The lower portion of Hollenbeck is approximately 300 units wide and has many boundaries. When varying the largest maximum jump length between 140 and 260, it becomes very probable that at least one boundary crossing would be attempted. At this point, the boundary permeability is expected to play a stronger role in the simulation.

[Figure 13 color map: "Metrics versus Parameters Sensitivity Color Map"]

Figure 13: Slopes of the best fit line to the rescaled data for each of the parameter and metric combinations depicted in a color map. The parameters varied include the Bounded Pareto scaling parameter, k, the von Mises scaling parameter, κ, the largest maximum jump length, A, the smallest maximum jump length, a, the boundary permeability, B, and the network threshold, T. The scale to the right of the image gives the slope values. Tones close to the center of the scale represent combinations where the metrics are not very sensitive to the respective parameter. Combinations with tones at the ends of the spectrum (black and white) represent metrics that are sensitive to the respective parameter. The numerical values are also listed in Table 5.

Depending  on  the  network,  changes  in  the  number  of  connections  could  be  more  or  less  beneficial  in 
terms  of  accuracy.  Further,  small  changes  in  the  connectivity,  i.e.  the  existence  or  non-existence  of  an  edge, 
could  have  small  effects  on  the  accuracy  measures  and  large  effects  on  the  shape  measures,  as  seen  for  our 
simulations  in  the  case  of  the  network  threshold  parameter. 

6.  Discussion 

Using  biased  truncated  Levy  walks  with  semi-permeable  boundaries,  we  have  designed  an  agent-based 
model  for  gang  members  that  incorporates  quasi-realistic  movement  rules  as  well  as  physical  geographic 
features  existing  in  Hollenbeck.  We  have  shown  that  it  is  able  to  simulate  a  gang  rivalry  network  similar 
to  the  one  observed  in  [71,  56].  The  Simulated  Biased  Levy  walk  Network  (SBLN),  the  Brownian  Motion 
Network  (BMN),  and  an  instance  of  a  Geographical  Threshold  Graph  (GTG)  were  compared  to  the  observed 
rivalry  network  using  measures  of  accuracy  and  shape. 

In  choosing  the  metrics  for  the  analysis  of  the  models,  we  took  into  account  that  the  observed  network 
may  not  represent  the  actual  set  of  rivalries  in  a  given  region.  If  the  true  rivalry  network  were  only  a  small 
perturbation  from  the  observed  network,  then  ideally,  we  would  like  the  metrics  to  only  vary  slightly.  For 
this  reason,  we  evaluated  these  models  with  the  accuracy  and  shape  metrics  presented.  The  accuracy  metrics 
provide  a  means  to  determine  whether  the  simulations  are  recreating  the  same  rivalries  that  are  observed. 
The  variance  of  nodal  degree  and  centrality  measures  indicate  the  spread  in  degrees  for  each  gang.  These 
measures  could  be  used  to  help  policy  makers  know  how  to  distribute  resources.  For  example,  if  one  gang 
has  several  rivals  and  the  rest  of  the  gangs  only  have  one,  then  the  resources  should  be  mainly  directed 
towards  the  one  gang  that  is  highly  centralized  within  the  network.  Alternatively,  if  all  gangs  have  the  same 
number  of  rivals,  then  resources  should  be  spread  more  equally  through  the  area.  The  density  metric  gives 
an indication as to the connectivity of the network. In terms of gang rivalries, a higher density indicates that a larger number of rivalries are present in the system. Creating a model that produces a network with similar shape metrics is desirable, since the connectivity and degree distributions could be useful for determining intervention strategies and methods for implementation.

We  implemented  simpler,  baseline  models  so  that  we  could  contrast  them  with  the  SBLN  model.  The 
baseline  models  did  not  perform  as  well  as  the  SBLN,  but  they  did  provide  some  insight  into  the  modeling 
of  gang  rivalry  networks.  The  GTG  is  a  simple  model  designed  to  compare  with  the  agent-based  models. 
This  method  performs  well  on  the  accuracy  metrics  and  provides  an  alternative,  computationally  inexpensive 
method to construct the rivalry network. One could extend this model to incorporate boundary information by increasing the distance function $d(n_i, n_j)$ if $n_i$ and $n_j$ are in distinct regions (see Section 3.1). The GTG model indicates that geography and size of gangs matter for the rivalry structure. However, the GTG is
limited  to  reproducing  only  the  rivalry  network  and  does  not  lend  itself  to  understanding  other  phenomena, 
such  as  the  gang  member  mobility  and  the  locations  of  interactions  between  gang  members.  It  is  not  obvious 
how  to  extend  the  modeling  framework  of  a  GTG  to  include  policing  strategies,  the  location  of  violence, 
retaliatory  behavior,  and  effects  of  injunctions,  unlike  agent-based  models. 
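The boundary-aware extension of the GTG distance mentioned above could look like the following sketch. Everything specific in it is an assumption made for illustration: the Euclidean base distance, the multiplicative form of the penalty, the default penalty value, and the function and argument names are not taken from the paper.

```python
import math


def boundary_aware_distance(p, q, region_p, region_q, penalty=2.0):
    """Distance between two gang locations, inflated across region boundaries.

    p, q: coordinate tuples of the two nodes.
    region_p, region_q: labels of the regions containing each node.
    penalty: hypothetical factor (> 1) applied when the regions differ.
    """
    base = math.dist(p, q)  # Euclidean distance between the two locations
    return base * penalty if region_p != region_q else base
```

Substituting such a function for the plain distance in the GTG edge rule would make cross-boundary rivalries less likely without changing the rest of the construction. An additive penalty, or one that depends on the number of boundaries crossed, would be an equally plausible design.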

The BMN is a simplified version of the SBLN model. Although the BMN accuracy results were not as strong as the GTG and SBLN results, this method was able to reproduce a shape similar to that of the observed rivalry network. This model incorporated geographical features, but ignored directional decisions of
the  agents.  The  presence  of  semi-permeable  boundaries  in  this  model  reduced  the  number  of  connections 
between  regions,  giving  the  simulated  network  a  more  similar  shape  to  the  observed.  This  suggests  that 
the  incorporation  of  boundaries  plays  an  important  role  in  the  replication  of  the  observed  network.  This  is 
corroborated  in  [56].  In  the  absence  of  the  directional  decisions  of  the  agents,  we  see  a  fundamental  difference 
between  the  SBLN  and  the  BMN.  Without  directional  decisions  the  BMN  has  too  many  high  degree  nodes. 
Another  major  problem  with  the  BMN  model  is  that  the  stopping  criterion  for  the  model  was  artificial,  in 
that we chose to stop it at the observed peak in accuracy. In general, there may not be an observed network, and so it would be difficult to determine a stopping criterion. Unlike the BMN, our proposed SBLN model
exhibits  long  term  stabilization  of  the  accuracy  and  density  metrics.  This  is  a  direct  consequence  of  the 
directional  decisions  of  the  agents.  The  need  for  incorporating  directional  decisions  into  a  mobility  model  is 
consistent  with  the  literature  [23]. 

The  SBLN  is  the  best  model  in  replicating  the  observed  network.  It  allows  for  easy  incorporation 
of  geographic  features  and  alternate  movement  dynamics,  while  maintaining  a  high  level  of  accuracy  and 
allowing for evolution in the observed system. This model is admittedly more complex than the other models; however, it provides better results in terms of accuracy and shape of the networks. The stochasticity in the model is beneficial for estimating not only the expected resulting network, but also potential networks that could arise from the seemingly random movements of individuals. One important feature is that this model
produces  stable  long  term  behavior,  meaning  there  is  a  reasonable  point  at  which  to  stop  the  number  of 
iterations.  This  has  practical  consequences  when  the  true  rivalry  network  is  not  known.  In  the  absence 
of  an  observed  network  with  which  to  calibrate  the  parameters  of  the  model,  various  parameters  could 
be  initialized  using  data  when  available.  There  has  been  some  work  done  on  approximating  the  probability 
density function for jump lengths in a given region. Using a methodology similar to that of [30], one could approximate the parameters associated with this region. In addition, an analysis of traffic flow under highways and within regions could be used to approximate the boundary crossing probability. This would greatly reduce the number
of  variables  of  the  model  and  would  be  an  interesting  avenue  for  future  study. 

Our modeling framework allows for a variety of data output, depending on the questions of interest. For example, our model is able to track the location of the agents’ interactions during the simulation. This can be
compared  to  violence  data  for  the  Hollenbeck  area,  and  preliminary  work  has  been  done  in  this  direction. 
Using  this  violence  data  would  provide  an  additional  data  set  to  use  for  testing  the  external  validity  of  our 
model.  The  issues  of  internal  and  external  credibility  are  discussed  in  [5].  Figure  14  shows  the  locations  of 
the  interactions  among  agents  for  one  of  the  Ensemble  SBLN  simulation  runs  and  the  density  estimation 
of  gang-related  violent  crimes  in  Hollenbeck  from  1998  through  2000.  The  juxtaposition  of  these  two  plots 
emphasizes  the  similarities  between  the  two  and  illustrates  the  potential  predictive  capabilities  of  this  kind 
of  approach.  Though  movement  and  interaction  rules  may  need  to  be  slightly  altered  to  provide  a  closer 
match  to  the  data,  the  current  model  provides  a  baseline  model  for  further  analysis  and  investigation  of  the 
gang  rivalry  violence  in  Hollenbeck. 

This highly flexible model provides a framework with which to test sociological theories related to gang activities. One question of great interest to social scientists is the role of territories in the motivation for gang violence [71]. One could encode territories into the model by having each agent place a marker, associated with its gang, on the locations where it has been. By leaving these markings, one could see territories begin to form. Once these territories were established, the behavioral rules could be changed to avoid or attack the territories of gangs, instead of one point in space. From this, social scientists would be able to play out various scenarios and test hypotheses. Another interesting phenomenon is the presence of alliances or truces between gangs, as has been observed in Chicago [7] and Los Angeles [71, 45]. The current model does not account for the difference between positive, negative, and neutral interactions. Instead, the SBLN records interactions between agents with the implicit assumption that these are negative interactions. A known truce between two gangs could be incorporated into the model by flagging that pair of gangs as allies, assigning a negative value to the corresponding element of the gang rivalry matrix R. Then, all future interactions between gangs with an alliance would not influence the movement decisions for agents of these gangs. Our model could also be used to examine this question.
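The truce mechanism described above can be sketched as follows. This is a hypothetical illustration, not the model's implementation: it assumes that R is a symmetric matrix stored as a list of lists, that its nonnegative entries count recorded interactions, and that any negative entry marks a truce. The function names and the sentinel value -1 are introduced here.

```python
def flag_truce(R, i, j, truce_value=-1.0):
    """Mark gangs i and j as allies by storing a negative entry in R."""
    R[i][j] = truce_value
    R[j][i] = truce_value


def is_truce(R, i, j):
    """A negative entry means the pair is flagged as allied."""
    return R[i][j] < 0


def record_interaction(R, i, j):
    """Count an interaction between gangs i and j unless they are allied."""
    if is_truce(R, i, j):
        return  # allied pairs do not accumulate rivalry strength
    R[i][j] += 1
    R[j][i] += 1
```

With this convention, the movement rules would consult `is_truce` before treating another gang as a rival, so that encounters between allied gangs leave both the rivalry matrix and the agents' directional decisions unchanged.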

Pursuing a model that accurately describes the violent behavior in Hollenbeck is of great value, since Hollenbeck is one of the most violent areas in Los Angeles [71, 28]. There are several advantages of approaching
this  serious  problem  using  a  mathematical  model.  One  major  strength  over  a  network  or  statistical  approach 
is  that  once  the  model  has  been  sufficiently  calibrated,  it  provides  a  powerful  tool  for  social  scientists  to  test 
theories  and  hypotheses.  The  paper  [25]  highlights  that  correlation  does  not  imply  causality,  and  though 
statistical  analysis  provides  important  contributions  to  social  science  as  a  field,  it  falls  short  in  terms  of 
identifying  underlying  mechanisms  and  testing  causation  hypotheses.  Though  one  could  argue  that  creating 
an  experiment  would  help  validate  or  reject  causal  arguments,  such  experiments  are  expensive  and  may 
bring  up  ethical  concerns.  Finally,  if  the  simulation  can  accurately  model  the  social  phenomena  of  interest, 
then we gain insight into how intervention strategies could alter the existing gang rivalry system. The costs of implementing these changes in the simulation are small compared to the public funds needed to implement experimental interventions. If the Hollenbeck area can be well understood by this approach,
there  may  be  hope  in  understanding,  and  potentially  mitigating,  other  areas  of  intense  violent  behavior. 


Figure 14: Locations of all the interactions between agents for one of the Ensemble SBLN runs (left). Density map of gang-related violent crimes in Hollenbeck between 1998 and 2000 (right).